METHOD AND COMPUTER SYSTEM FOR SOFTWARE TUNING

Field of the Invention

The present invention generally relates to electronic
data processing, and more particularly, relates to
methods, computer program products and systems for
software tuning.

Background of the Invention

Some software products (e.g., application systems,
database systems, etc.) include parameter profiles that
can be set by specialists to achieve an optimal
performance of the software product in a given
environment of a computer system. The environment is
determined, for example, by the used hardware,
operating system, network data transfer speed, and many
other factors. There are cases where the specialist has
to use a trial and error procedure to determine the
parameters where the software product performs best in
the given environment. Typically, the parameters are
part of a static configuration of the software product
that is defined once.

In the publication "Statistical Models for
Automatic Performance Tuning" by Richard Vuduc et al.,
automatic tuning systems are proposed that are based on
search-based systems. The paper discloses a heuristic
for stopping an exhaustive compile-time search early if
a near-optimal implementation is found. Further, it
shows how to construct run-time decision rules, based
on run-time inputs, for selecting from among a subset
of the best implementations. Complex statistical
techniques are used to exploit a large amount of
performance data collected during a search. The
run-time decision rules can be costly so that the
compile-time search may be preferable.

Summary of the Invention

One embodiment of the invention provides a simple
mechanism for enabling a computer program that runs on
a computer system to tune itself without human
interaction for achieving optimal system performance in
a given environment at runtime. This embodiment can be
implemented according to the claims 1, 8, 9, and 17. An
advantage of this embodiment is that simple comparisons
with threshold values are used for the selection of the
most suitable algorithm for a specific task instead of
complex statistical techniques. A further advantage
lies in the ability to handle multidimensional
dependencies of the performance of alternative
algorithms for performing the task.
Another embodiment provides a mechanism to enable
the computer program to dynamically adjust tuning
parameters at runtime when the environment changes.
This embodiment can be implemented according to the
claims 2, 8, 9, and 18. This embodiment allows the
software application to recalculate threshold values of
multiple dimensions based on the actual performance of
the alternative algorithms. If appropriate, the
software application can use the recalculated threshold
values for future algorithm selection.
In another embodiment of the invention a data
storage system automatically switches between multiple
data retrieval algorithms. This embodiment can be
implemented according to the claims 10 and 16 and
provides a fast data retrieval mechanism in the
presence of more than one parameter influencing the
performance of the data retrieval.

The aspects of the invention will be realised and
attained by means of the elements and combinations
particularly pointed out in the appended claims. Also,
the described combination of the features of the
invention is not to be understood as a limitation, and
all the features can be combined in other constellations
without departing from the spirit of the invention. It
is to be understood that both the foregoing general
description and the following detailed description are
exemplary and explanatory only and are not restrictive
of the invention as described.

Brief Description of the Drawings 



FIG. 1  is a simplified block diagram of a computer
        system that can be used with an embodiment of
        the invention;
FIG. 2  illustrates initialising threshold values;
FIG. 3  illustrates dynamically adjusting threshold
        values in one dimension;
FIG. 4  illustrates threshold values and corresponding
        algorithms in two dimensions;
FIG. 5  is a simplified block diagram of an example of a
        data storage computer system that can be
        operated according to the invention;
FIG. 6  is a diagram of a static hierarchical data
        structure used in one embodiment of the data
        storage system;
FIG. 7  schematically shows the initial state of an
        anchor as used in the data structure;
FIG. 8  illustrates the use of the anchor for the
        implementation of an InfoType;
FIG. 9  illustrates adding an InfoCell to the data
        structure;
FIG. 10 illustrates the structure that is obtained when
        multiple InfoTypes are put into the data
        structure;
FIG. 11 shows an InfoCourse that contains data;
FIG. 12 illustrates multiple InfoCourse paths in the
        data structure;
FIG. 13 illustrates how to retrieve data from the data
        storage system when operated according to the
        invention;
FIG. 14 illustrates how two result sets can be merged
        into a single result set when applying the
        Boolean OR operator;
FIG. 15 illustrates how two result sets can be merged
        into a single result set when applying the
        Boolean AND operator;
FIG. 16 illustrates a first implementation for the
        result flags and the result sets;
FIG. 17 illustrates a second implementation for the
        result flags and the result sets;
FIG. 18 illustrates how result flags relate to
        corresponding IC-anchors in the second
        implementation;
FIG. 19 illustrates how Boolean operators can be
        applied to the result sets in the second
        implementation; and
FIG. 20 is a simplified block diagram of software
        components of the computer system to dynamically
        select a data retriever implementation.






Detailed Description of the Invention

FIG. 1 shows a software application 200 as part of a
computer system 990 that can be used with an embodiment
of the invention. The software application 200 uses
parameter variables 210 that can be set to specific
threshold values for a corresponding parameter. The
threshold values can come from a parameter profile
(e.g., for the second parameter PARAMETER 2) or they
can be calculated by the software application 200
(e.g., for the first parameter PARAMETER 1). For
example, the software application can be a technical
software application, such as a data storage system
management application, or it can be a business software
application, such as an enterprise resource planning
application, or a customer relationship management
application, or any other software application.

The parameter variables 210 store information
about the parameters that can be used to influence the
performance of the software application 200 with
regards to a specific task. The parameter variables P1
to Pn will also be referred to as variables. The
software application implements various algorithms A1
to AN for performing the specific task in different
ways. Further algorithms used for different tasks may
be implemented in the software application. For
example, specific tasks can be sorting data, retrieving
data, filtering data or any other operation performed
on data that may depend on a parameter that has
influence on the performance of the specific task. For
example, threshold values that can influence the
performance can be either hardware related parameters
(e.g., the number of processors in the computer system,
the available main memory of the computer system) or
software related parameters (e.g., the main memory
allocated to the software application, the size of the
data volume, the number of hits of a query, or any
other parameter value that can influence the
performance of the software application). Software
related parameters can easily be modified by the
software application itself, whereas hardware related
parameter modification in general requires human
interaction (e.g., adding an additional blade in a
blade server) in many cases.

The software application further implements a
threshold evaluator 220 and a threshold calculator 230.

In a first step the threshold calculator 230 is
used to calculate 410 one or more threshold values of
the first parameter that relates to the specific task.
Details are explained under FIG. 2. The calculated one
or more threshold values are stored 411 in
corresponding variables (e.g., variable P1). In one
alternative, multiple threshold values of the parameter
are stored in one variable (vector variable). In
another alternative, for each threshold value a
corresponding variable is used.

The software application generates current values
with respect to the parameters. For example, for
performing a sort function for data in a list, the
current value can correspond to the length of the list.
A threshold value in the first variable P1 indicates
that for a current value below the threshold value the
best system performance is achieved when using a first
algorithm (e.g., A1) and for a current value above the
threshold value the best system performance is achieved
when using a second algorithm (e.g., A2). In other
words, each algorithm covers a corresponding value
range where the use of the algorithm provides the best
system performance. That is, the one or more threshold
values separate the value range of the first parameter
P1 into at least two intervals.

The threshold evaluator 220 uses 420 the one or
more threshold values P1 by comparing 430 the current
value with the one or more threshold values to
determine the appropriate algorithm for performing the
specific task with optimal performance. In the example,
the first algorithm A1 is selected 440 from the
plurality of algorithms A1, A2, AN for performing the
specific task. The selection is in accordance with the
result of the comparing step 430. For example, the
first algorithm A1 is assigned to the interval that
includes the current value. A1 is determined but any
other algorithm A2 to AN might be selected (dashed
arrows). This depends on the interval to which the
current value belongs.
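
The comparing step 430 and selecting step 440 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `select_algorithm`, `insertion_sort`, `builtin_sort` and `tuned_sort`, the threshold of 32 list elements, and the rule that a current value exactly equal to a threshold falls into the upper interval are not taken from the description.

```python
from bisect import bisect_right


def select_algorithm(current_value, thresholds, algorithms):
    """Return the algorithm assigned to the interval containing current_value.

    thresholds must be sorted ascending; k thresholds split the value
    range into k + 1 intervals, so k + 1 algorithms are expected.
    """
    if len(algorithms) != len(thresholds) + 1:
        raise ValueError("need exactly one algorithm per interval")
    # bisect_right counts the thresholds <= current_value, which is the
    # index of the interval (assumption: equality goes to the upper one).
    return algorithms[bisect_right(thresholds, current_value)]


def insertion_sort(items):
    """Hypothetical algorithm A1: adequate for short lists."""
    result = []
    for item in items:
        result.insert(bisect_right(result, item), item)
    return result


def builtin_sort(items):
    """Hypothetical algorithm A2: preferable for longer lists."""
    return sorted(items)


def tuned_sort(items, thresholds=(32,)):
    # The current value of the parameter is the length of the list.
    algorithm = select_algorithm(len(items), list(thresholds),
                                 [insertion_sort, builtin_sort])
    return algorithm(items)
```

Only simple comparisons against the stored threshold values are needed at selection time; the cost of finding good threshold values is paid elsewhere.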
Once the specific task has been performed, the
actual performance of the selected first algorithm A1
can be measured 450 and a check is performed 460
whether the measured performance complies with the one
or more threshold values, that is, whether the
assignment of the selected algorithm to the interval
including the current value delivers the best
performance within the plurality of algorithms.

The threshold calculator 230 uses the performance
measure and, in case the performance measure does not
comply with the current setting of the one or more
threshold values for the first parameter P1,
recalculates 470 the one or more threshold values for
the first parameter P1. The one or more recalculated
threshold values are then used to update 471 the
corresponding variables 210.

FIG. 2 illustrates initialising threshold values.
Regarding the initial definition of threshold values,
one alternative is to provide a profile parameter for
each threshold value in a profile file for the software
application 200. Profile files are commonly used, for
example, for defining buffer sizes, time out
parameters, hardware configuration parameters, or
parameters for determining software behaviour in
specific situations, such as error handling. For
example, some parameters may influence the performance
of a software application. Usually, the setting of
profile parameters requires a specialist who is
familiar with the architecture of the software and has
a good feeling for the way the software is influenced
by the parameter settings. Furthermore, the specialist
has to know the value ranges of each parameter. In
practice, it turns out that the more profile parameters
are available, the less likely it is that the
specialist will succeed in tuning the software for
optimal performance.

For this reason, an embodiment of the invention
can be used to reduce the number of profile parameters
that have to be manually set to a necessary minimum.
Specialists working from outside the software may tune
only parameters that depend on a specific use case or
business scenario, where pre-tuning of the software is
difficult. For example, consider the pre-tuning of
relational database management systems, and in
particular deciding in advance which indexes to create,
which depends on the final structure of the various
tables in the database system.

In this embodiment of the invention, the software
itself can tune scenario-independent parameters, such
as the threshold values for the various algorithms A1
to AN. The initial values may be set during start-up of
the software by running predefined test cases for the
various algorithms.

The example of FIG. 2 illustrates the automatic
determination (initial calculation 410, cf. FIG. 1) of
a threshold value with regards to two algorithms A1
(illustrated by bullet points) and A2 (illustrated by
circles). In the example, a parameter p is tuned to a
series of discrete values between a pair of chosen
extreme values. The spacing of values between the
extremes need not be very fine and can be equidistant.
For each value p(x), a measurement is made of the
performance difference D(p(x)) (defined as runtime
difference or any other suitable measure) between the
two algorithms. For example, the difference may be
defined as D(p(x)) = PA1(p(x)) - PA2(p(x)). For each
difference D(p(x)), either a single measurement is made
or the average of several runs can be taken.
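
One way to obtain D(p(x)) is sketched below in Python. It assumes, purely for illustration, that the performance of an algorithm is taken as its negative mean runtime, so that D is positive when the first algorithm runs faster. The function names and the injectable `timer` argument are hypothetical and not part of the description.

```python
import time


def mean_runtime(algorithm, data, runs, timer):
    """Average elapsed time of `runs` executions of algorithm(data)."""
    total = 0.0
    for _ in range(runs):
        start = timer()
        algorithm(data)
        total += timer() - start
    return total / runs


def performance_difference(alg1, alg2, data, runs=3,
                           timer=time.perf_counter):
    """D = PA1 - PA2, with performance defined as negative mean runtime.

    The result is positive when alg1 is faster than alg2 on `data`.
    """
    runtime1 = mean_runtime(alg1, data, runs, timer)
    runtime2 = mean_runtime(alg2, data, runs, timer)
    return runtime2 - runtime1
```

Averaging over several runs reduces the influence of short-lived disturbances such as other processes competing for the processor.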

Below the threshold value, the performance of
algorithm A1 decreases steadily with increasing value
p(x) whereas the performance of algorithm A2 increases,
so the magnitude of the performance difference D(p(x))
between the algorithms decreases but always has the
same sign. For example, if D(p(x)) = PA1(p(x)) -
PA2(p(x)), then the difference is positive as long as
algorithm A1 has a better performance than algorithm
A2.
The measured performance difference D(p(a)) for
value p(a) is the last positive difference, so p(a) is
the greatest value of p such that D(p) > 0. From value
p(b) onward, the difference is negative, so p(b) is the
least value of p such that D(p) < 0. The threshold
value lies in the interval between values p(a) and
p(b).

An iteration, for example based on an interval
bisection procedure, can be used to locate the
threshold value within the interval [p(a), p(b)].

For the first step of the iteration, the value
p(a) is the left interval border, p(b) is the right
interval border and p(c) = p(a) + [p(b) - p(a)] / 2 is
the middle of the interval. A new measurement D(p(c)) of
the performance difference between the two algorithms
is made for value p(c).

If the difference is positive, D(p(c)) > 0, then
the threshold value lies in the right half-interval
[p(c), p(b)].

If the difference is negative, D(p(c)) < 0, then
the threshold value lies in the left half-interval
[p(a), p(c)].

If the difference is zero, D(p(c)) = 0, then the
threshold value is determined exactly by the parameter
value p(c) and the iteration is complete.

If the difference is greater than a predefined
delta, the iteration continues. The half-interval
containing the threshold value is subdivided into two
smaller half-intervals (which are either [p(c), p(d)]
and [p(d), p(b)] or [p(a), p(d)] and [p(d), p(c)],
depending on whether D(p(c)) is positive or negative)
and the performance difference D(p(d)) is evaluated for
the value p(d), and so on, as above.

The procedure stops as soon as the threshold value
has been identified with sufficient precision. This may
depend on:

- The size of the measured difference D(p), for
  example, whether D(p) < delta, for some predefined
  minimal difference delta.
- The type of the parameter, for example, whether
  p(x) is of type integer or floating point.
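
The interval bisection described above can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: the function name `find_threshold`, the `integer` flag, the `max_steps` safety limit, and the choice to return the right border when no integer remains between the borders are assumptions. The callable `diff` stands for any measurement of D(p), for example a timed comparison of the two algorithms.

```python
def find_threshold(diff, p_a, p_b, delta, integer=False, max_steps=64):
    """Locate the parameter value where diff changes sign.

    diff(p) returns D(p) = PA1(p) - PA2(p); it must be positive at p_a
    and negative at p_b. The search stops when |D| < delta or, for an
    integer parameter, when the borders are adjacent.
    """
    if not (diff(p_a) > 0 and diff(p_b) < 0):
        raise ValueError("expected D(p_a) > 0 and D(p_b) < 0")
    p_c = p_a
    for _ in range(max_steps):
        if integer and p_b - p_a <= 1:
            return p_b  # assumption: first value where A2 is better
        # Middle of the current interval.
        p_c = (p_a + p_b) // 2 if integer else p_a + (p_b - p_a) / 2
        d = diff(p_c)
        if abs(d) < delta:
            return p_c  # threshold found with sufficient precision
        if d > 0:
            p_a = p_c   # threshold lies in the right half-interval
        else:
            p_b = p_c   # threshold lies in the left half-interval
    return p_c
```

Each step halves the interval, so the number of measurements grows only logarithmically with the required precision.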
In this way, this embodiment of the invention can
calculate initial values for all threshold values
during start-up.

This start-up calculation can last for several
milliseconds or even seconds before the software is up
and running.

However, the software tunes itself automatically
and optimally on the given environment (e.g., given
hardware, operating system). It can be expected to do
so more quickly, more exactly, and more inexpensively
than a specialist could tune the software by manually
setting profile parameters.

FIG. 3 illustrates updating threshold values
dynamically during operation of the software
application 200 (cf. FIG. 1). There can be the
plurality of algorithms A1 to AN for performing the
same task, where each algorithm is used for a
corresponding interval. That is, each algorithm has at
least one threshold value that represents a boundary of
the corresponding interval.

Even if the initial threshold values are correct at
start-up and for some time after start-up, this
situation may change in the course of time, for
example, because of memory fragmentation or
accumulating memory leaks due to bugs in the coding or
other reasons. Therefore, after a certain time the
software can run under conditions that differ from
those that prevailed immediately after start-up.

The performance of any of the algorithms A1 to AN
may degrade or improve by different amounts relative to
the other algorithms. Therefore, the corresponding
threshold values may shift in the course of time.

This embodiment of the invention can automatically
and regularly repeat its determination of its threshold
values, as specified in the above calculation (cf. FIG.
2), so as to adjust the threshold values used to switch
algorithms dynamically during runtime.

To revise its determination of threshold values,
the software application 200 makes automatic
performance measurements 450 (cf. FIG. 1), for example,
by using an appropriate time measuring component. For
example, the measurements can simply be records of the
time taken for certain tasks to run. The measurements
can be made either on an ongoing basis or from time to
time.

When using ongoing performance measurements, the
software application 200 measures the performance of
each execution of an algorithm.

If the current execution of algorithm A1
corresponds to a current value p(x) that is below the
threshold value B1, the performance PA1(p(x)) of
algorithm A1 should in general be better than its
performance at the corresponding threshold value B1,
since this is the reason why the software 200 executes
algorithm A1 instead of algorithm A2 (see FIG. 2).

At the threshold value B1, the performance of
algorithm A1 is by definition the same as that of
algorithm A2, that is, PA1(B1) = PA2(B1).

In FIG. 3, for the algorithm An (1 < n < N) chosen
in the interval between the two neighbouring threshold
values B(n - 1) and Bn, the performance PAn(pc) of the
algorithm An at the value pc (in the interval) should
be either the same as or better than its performance
PAn(B(n - 1)) and PAn(Bn) at the lower and upper
threshold values B(n - 1) and Bn.

If the performance PAn(pc) of algorithm An (middle
arrow) is below its performance PAn(B(n - 1)) (left
arrow) or PAn(Bn) (right arrow) at a neighbouring
threshold value (either B(n - 1) or Bn, respectively),
then it is no longer advantageous to choose the
algorithm An at the value pc. As a consequence, the
checking step 460 (cf. FIG. 1) concludes that the
measured performance for the algorithm does not comply
with the current setting of threshold values.

Therefore, in future, at parameter value pc, the
software application chooses an algorithm that was
earlier measured as performing better in the
neighbouring interval, which is either algorithm
A(n - 1) or A(n + 1). This choice is equivalent to
moving the threshold value B(n - 1) or Bn,
respectively, to a new position at value pc, which
corresponds to the recalculating step 470 (cf. FIG. 1).
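
The check and the threshold shift can be sketched in Python as follows. The sketch assumes that performance is expressed as a number where larger means better, and that when the measured performance falls below both reference values, the border with the higher reference performance is the one that moves; the description does not specify this tie-breaking rule. The function name `adjust_interval` is hypothetical.

```python
def adjust_interval(lower, upper, pc, perf_pc, perf_at_lower, perf_at_upper):
    """Check algorithm An at value pc and move one border if needed.

    lower, upper: threshold values B(n - 1) and Bn of the interval.
    perf_pc: newly measured performance of An at pc.
    perf_at_lower, perf_at_upper: performance of An at the borders.
    Returns (new_lower, new_upper, changed).
    """
    if not lower <= pc <= upper:
        raise ValueError("pc must lie inside the interval")
    if perf_pc >= perf_at_lower and perf_pc >= perf_at_upper:
        return lower, upper, False  # measurement complies with thresholds
    if perf_at_lower >= perf_at_upper:
        # A(n - 1) takes over at pc: B(n - 1) moves up to pc.
        return pc, upper, True
    # A(n + 1) takes over at pc: Bn moves down to pc.
    return lower, pc, True
```

For example, `adjust_interval(10, 20, 15, 0.8, 1.0, 0.9)` yields `(15, 20, True)`, which shrinks the interval of An from the left so that A(n - 1) is used at pc in future.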

However, when algorithm A(n - 1) or A(n + 1) is
next run at parameter value pc, it may be the case that
the newly measured performance of the algorithm is also
reduced, possibly even more so than the performance of
the original algorithm An. In this case, the threshold
value B(n - 1) or Bn, respectively, should not have
been moved to the new position pc.
This situation may occur in practice because any
reasons for the reduced performance of algorithm An may
also apply to reduce the performance of algorithm
A(n - 1) or A(n + 1).

Therefore, a situation like this can trigger a
recalculation of all the threshold values, either
immediately or as soon as practically possible, for
example, when the system load is sufficiently low.
Alternatively, the software can generate a system
message to warn an administrator that the latest
performance measurements indicate the need for a
recalculation of the threshold values.

When using performance measurements from time to
time, the software recalculates 470 the threshold
values preferably at times of low system load, in the
same way that it does during start-up for the initial
calculation 410 (cf. FIG. 1). This recalculation may
be defined as part of a bundle of housekeeping tasks
that are performed at regular time intervals by the
software. In this case, the threshold values are
adjusted with a lower frequency than when using the
ongoing basis alternative.

Removing as many manually set profile parameters
as possible from a profile file and letting the
software itself tune such parameters instead of a
specialist can lead to an improved performance over the
full range of parameter values.

However, in certain exceptional and rare
situations, there may be good reasons why such
parameters should not be tuned by the software but from
outside by a specialist.

These exceptional cases can be handled as follows.
By default, profile parameters that are tuned by the
software itself do not appear in the profile file.
However, if an expert explicitly sets a parameter
threshold value by entering it in the profile file,
then the software does not change the threshold value
of this parameter.
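
The precedence of explicit profile entries over self-tuned values can be sketched in Python as follows. The dictionary representation of the profile file and the names `effective_thresholds` and `apply_recalculation` are assumptions made for this illustration only.

```python
def effective_thresholds(tuned, profile):
    """Combine self-tuned thresholds with explicit profile entries.

    A parameter listed in the profile keeps the expert's value;
    every other parameter uses the value tuned by the software.
    """
    merged = dict(tuned)
    merged.update(profile)
    return merged


def apply_recalculation(current, recalculated, profile):
    """Accept recalculated values, except for parameters pinned in the profile."""
    updated = dict(current)
    for name, value in recalculated.items():
        if name not in profile:
            updated[name] = value
    return updated
```

With a profile containing only `{"B1": 40}`, a later recalculation of B1 and B2 changes B2 but leaves B1 at 40.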

FIG. 4 illustrates threshold values and corresponding
algorithms in two dimensions. The first dimension is
defined by the first parameter p as described in FIG.
3. The second dimension is defined by a second
parameter p'. For the second parameter p', for example,
three threshold values B'(n - 1), B'(n), and B'(n + 1)
can be stored in the corresponding variables. There can
be any number of further parameters defining further
dimensions.

In the example, for each value of the first
parameter p, two algorithms are available. In general,
any number of algorithms can be available for each
value of one dimension. For example, for the value pc
the algorithms An and A'n can be used in the first
dimension interval [B(n - 1), B(n)] to achieve optimal
performance regarding the first parameter p. Each
algorithm is represented by a corresponding rectangle
in the drawing to reflect the coverage of the two
dimensions. However, which of the two algorithms
provides the best performance depends also on the
second dimension. If the value of p' is in the second
dimension interval [B'(n - 1), B'(n)], then the
algorithm An is selected by the software application.
If the value of p' is in the second dimension interval
[B'(n), B'(n + 1)], then the algorithm A'n is selected.
That is, in the case of multidimensional performance
dependencies the threshold evaluator compares a
plurality of current values of various dimensions to a
plurality of corresponding threshold values and selects
the appropriate algorithm for the specific task that
provides the best performance for the current
combination of current values in the various
dimensions.
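
A possible Python sketch of the multidimensional comparison follows. It assumes that each dimension has its own ascending list of threshold values and that the algorithms are stored in a table keyed by the tuple of interval indices; the function name `select_algorithm_nd`, the table layout, and the sample threshold values are hypothetical.

```python
from bisect import bisect_right


def select_algorithm_nd(current_values, thresholds_per_dim, table):
    """Select an algorithm from the interval indices of all dimensions.

    current_values: one current value per dimension.
    thresholds_per_dim: one ascending threshold list per dimension.
    table: maps a tuple of interval indices to an algorithm.
    """
    if len(current_values) != len(thresholds_per_dim):
        raise ValueError("one current value per dimension is required")
    key = tuple(bisect_right(thresholds, value)
                for value, thresholds in zip(current_values,
                                             thresholds_per_dim))
    return table[key]


# Illustrative data: B(n - 1) = 10, B(n) = 20 for parameter p and
# B'(n - 1) = 1, B'(n) = 2, B'(n + 1) = 3 for parameter p'.
THRESHOLDS = [[10, 20], [1, 2, 3]]
TABLE = {(1, 1): "An", (1, 2): "A'n"}
```

With these sample values, `select_algorithm_nd((15, 1.5), THRESHOLDS, TABLE)` returns `"An"` and `select_algorithm_nd((15, 2.5), THRESHOLDS, TABLE)` returns `"A'n"`, matching the two rectangles described for FIG. 4.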

The threshold calculator can initialise the
threshold values of various dimensions by using the
initialising procedure described under FIG. 2 for one
dimension while values of the further dimensions are
kept constant during the performance measurement.

In the following, an example for the software
application 200 is a database management software of a
data (storage) system that can be used together with an
embodiment of the invention. The data system may be
implemented according to a relational database model.
However, the system is not limited to use within the
constraints of a known relational database
architecture. The elements of the data system roughly
translate to the known nomenclature of the relational
database theory as follows (with the definitions used
with an embodiment of the invention on the left):

InfoSystem   <->  Management System
InfoArea     <->  Database
InfoCluster  <->  Table
InfoType     <->  Attribute
InfoCourse   <->  Data record
InfoCell     <->  Field

Further definitions of terms, as used hereinafter:

Boolean operators:
operators used in Boolean statements, e.g., AND, OR.

Relational operators:
operators used in relational statements, e.g.,
<  (less than)
<= (less than or equal to)
>  (greater than)
>= (greater than or equal to)
=  (equal to)
<> (not equal to)

Condition:
relational statement comparing data, such as numerical
data or alphanumeric data, using one or more relational
operators.

Boolean expression:
statement including multiple conditions that are
combined using Boolean operators.

FIG. 5 is a simplified block diagram of the computer
system 990 that can be used with an embodiment of the
invention. The computer system 990 includes multiple
computing devices (e.g., first computing device 901 and
second computing device 902) that communicate over a
network 999, such as a local area network (LAN), wide
area network (WAN), the Internet, or a wireless
network.

For example, the second computing device 902 may
be a backend system, such as a database system, a file
system or an application system, that stores data. The
data can also be stored anywhere inside or outside of
the computer system 990.

The first computing device 901 may be used to
compose Boolean expressions 500 to be used in a QUERY
for retrieving selected data from the second computing
device 902. For example, the first computing device 901
may be a front end computer that provides a graphical
user interface (GUI) to a user.

There can be various ways in which the data
storage system 902 receives the QUERY, dependent on the
interfaces offered for the data storage system 902. For
example, in case of using an SAP R/3 based system, the
SAP Remote Function Call (RFC) functionality provided
by the ABAP kernel can be used. An application
programming interface (API) can be implemented as a
collection of ABAP Function Modules. The API uses the
RFC functionality to communicate remotely with the data
storage system. An SAP R/3 based application uses the
API for receiving parameters that are passed to the
data storage system 902. The corresponding results are
then returned as ABAP parameters. A selection query is
filled into an internal table in ABAP and can be
rapidly processed by the data storage system since the
query is already pre-structured.

In general, any interface or meta format can be
used to post a query to the data storage system. A
pre-structured query is useful but not necessary. The
query may also be coded in XML, or simply be passed to
the data storage system as a string that has to be
parsed within the data storage system.

FIGs. 5 to 11 explain details of one embodiment of
the data storage system 902. For example, as described
in the patent application PCT/EP02/01026, the data
storage system 902 can be configured as a fast cache
with all data structures residing in its main memory.
The Boolean expression 500 can include at least a first
portion 501 and a second portion 502, each portion
representing a selection condition of any degree of
complexity applicable to the data structures in the
main memory. Further portions may be included. The
portions are combined through logical or relational
operators (OP).

FIG. 6 is a diagram of a static hierarchy structure
used in one embodiment of the data storage system 902.
Each box in the structure corresponds to an instance of
the data type that is used as a label for the box.
Multiple overlapping boxes illustrate multiple
instances of the same data type. A single arrow between
instances of different data types stands for an
arbitrary number of arrows between multiple instances
at each corresponding level of the structure. In the
following, the data type labels are used to refer to
corresponding instances of the data type. The highest
level in the structure is the InfoSystem level. Down
from the top level one or more InfoAreas are connected
to the InfoSystem. The InfoSystem provides algorithms
necessary to operate the data storage system in run
time. The InfoSystem is connected to any number of
InfoAreas through a linking element, which will be
described hereinafter as an anchor. These InfoAreas
can, for example, refer to logical units of the
InfoSystem.

Each InfoArea is connected via a linking element
(again an anchor as described hereinafter) to an
InfoCluster. In turn, each InfoCluster is connected to
at least one InfoCourse and at least one InfoType,
through respective linking elements, such as anchors.
The InfoType can be seen as an attribute of a table; an
InfoCourse always starts in an InfoCluster. If an
InfoCourse stays within an InfoCluster with its
addressed InfoCell elements corresponding to fields of
a table, then the InfoCourse is similar to a record of
a table, such as a relational database table.

Under the InfoCourse and the InfoType the InfoCell
is found; this is the element on the lowest level in
the hierarchical structure. On the creation of an
InfoType an anchor is created that is an InfoCell also.
This anchor has the function to represent the structure
of following InfoCell elements.

For the implementation of the levels below the
InfoArea level, i.e. the InfoCluster, the InfoCourse,
the InfoType, and the InfoCell levels, use is made of a
data element according to the invention as shown in
FIG. 7. In this example, the data element is shown
schematically as an anchor, and is provided with a
number of pointers. The pointers of the first pair are
labelled LVR and RVR (Left Vertical Ring, respectively
Right Vertical Ring), the pointers of the second pair
are labelled LHR and RHR (Left Horizontal Ring,
respectively Right Horizontal Ring), the pointers of
the third pair are labelled LSR and RSR (Left Self
Ring, respectively Right Self Ring), and the single
pointer is labelled IF (InFormation bridge). Note that
the pointers LSR, RSR and IF are in principle optional.
Further pointers may be used. In the initial state,
as shown in FIG. 7, all pointers point to the anchor.
This initial state is also the simplest of possible
ring structures. Every pointer in the structure has a
valid address, and cases of an undefined pointer (nil
pointer) are avoided.
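
The initial state described above can be sketched in a
few lines of Python. This is only an illustration under
assumptions, not the claimed implementation: the class
name `Anchor` and the lower-case attribute names are
hypothetical and merely mirror the pointer labels of
FIG. 7.

```python
class Anchor:
    """Hypothetical data element carrying the pointers named in the text."""

    # Attribute names are assumptions that mirror the pointer labels.
    POINTER_NAMES = ("lvr", "rvr", "lhr", "rhr", "lsr", "rsr", "info_bridge")

    def __init__(self):
        # Initial state: every pointer refers back to the element itself,
        # so no pointer is ever undefined (nil).
        for name in self.POINTER_NAMES:
            setattr(self, name, self)

    def is_initial_ring(self):
        """True while all pointers still point to the anchor itself."""
        return all(getattr(self, name) is self for name in self.POINTER_NAMES)


anchor = Anchor()
print(anchor.is_initial_ring())  # True
```

Because every pointer always holds a valid reference,
traversal code can test for "back at the anchor"
instead of testing for a nil pointer.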



In the following, example data is used as shown in
table A of FIG. 12. The table includes data regarding
first names, ages and weights. For this table an
InfoCluster is generated. Furthermore, three InfoTypes
are generated to represent first names, ages, and
weights, respectively.

FIG. 8 illustrates the use of the data element for the
implementation of the InfoType. In the InfoType,
semantic information is included, such as the data
type (in this example "INTEGER"), the field name (in
this example "age"), etc. The InfoType has an anchor
associated with it. The anchor points with
its RVR pointer to the actual information carrier, that
is, the InfoCell. The InfoCell is, as described above,
the lowest level entity within the data system. The
InfoCell holds the information, as shown in FIG. 8; in
this example "age is 30 in INTEGER".

As described above, the InfoCell is provided with
an LVR/RVR pointer pair. As shown in FIG. 8, the RVR
pointer of the InfoCell points towards the anchor, and
the LVR pointer also points to the anchor. As a result,
the ring configuration of the anchor is maintained.

FIG. 9 illustrates how a further InfoCell is added to
the data structure. The InfoCell (with the value "25")
is inserted in the LVR ring after the first InfoCell.
The LVR and RVR pointers of the InfoCell point to the
anchor, so as to maintain a closed ring.

The order in which the InfoCells are organized
depends on their value. In case of a smaller value, the
InfoCell is ordered in on the LVR side, otherwise on
the RVR side. This practice is well known in the art as
binary tree building. Preferably, the binary trees are
organized as balanced or AVL trees, methods which are
well known in the art. These kinds of trees minimize
the number of levels within the tree structure, so as
to minimize access time. Preferably, all tree
structures within the data system are dynamically
balanced in use, so as to guarantee optimum access
times.
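
The ordering rule can be illustrated with a minimal
Python sketch. It assumes hypothetical class names
(`TypeAnchor`, `InfoCell`) and shows only the plain
binary-tree insertion with end pointers that lead back
to the anchor; the balancing (e.g. AVL rotations)
mentioned above is deliberately left out.

```python
class InfoCell:
    def __init__(self, value, anchor):
        self.value = value
        # End pointers refer back to the anchor to keep the ring closed.
        self.lvr = anchor
        self.rvr = anchor


class TypeAnchor:
    """Hypothetical anchor of one InfoType; its RVR leads to the first cell."""

    def __init__(self):
        self.lvr = self
        self.rvr = self

    def insert(self, value):
        cell = InfoCell(value, self)
        if self.rvr is self:              # no InfoCell stored yet
            self.rvr = cell
            return cell
        node = self.rvr
        while True:
            # Smaller values go to the LVR side, all others to the RVR side.
            side = "lvr" if value < node.value else "rvr"
            child = getattr(node, side)
            if child is self:             # reached an end pointer
                setattr(node, side, cell)
                return cell
            node = child

    def in_order(self):
        """Return all values in ascending order (iterative traversal)."""
        out, stack, node = [], [], self.rvr
        while stack or node is not self:
            while node is not self:
                stack.append(node)
                node = node.lvr
            node = stack.pop()
            out.append(node.value)
            node = node.rvr
        return out


ages = TypeAnchor()
for age in (30, 25, 41):
    ages.insert(age)
print(ages.in_order())  # [25, 30, 41]
```

The anchor doubles as the end marker of every branch,
which matches the role of a break or return sign that
the description assigns to it.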

FIG. 10 illustrates the structure that is obtained when
all InfoTypes of the table A are put into the data
structure. In total, three InfoTypes are present: age,
first name, and weight. Note that the end pointers of
each last element in the respective trees are not
shown. Under each anchor of the InfoType, the InfoCells
are organized in a binary tree. The InfoCluster points
to an anchor which in turn points to a first InfoType.
The first InfoType in turn points to the other two
InfoTypes. Each InfoType points to an anchor. The
anchor has the additional function of a marker that
can be used by an access or query process as a break or
return sign.

To complete the implementation of the table, the
relations between the InfoTypes have to be made. To this
end, an InfoCourse is introduced.



FIG. 11 shows the InfoCourse that contains the data for
a row of the table A. Use is made of the LHR and RHR
pointers. The end pointers again point back to the
anchor of the InfoCourse to maintain the ring
structure. Note that the InfoCourse also forms a binary
tree, sorted by the ID numbers of the InfoTypes. Note
that the ID numbers of the InfoTypes are unique. For
example, integer values are used for the ID numbers.

FIG. 12 illustrates all the InfoCourse paths (for
example implemented using pointers) for the table A.
Note that all InfoCells have been provided in the top
section with their respective InfoType ID number, over
which the binary tree configuration of the InfoCourse
via the LHR/RHR pointers is organized. Elements that
belong to an InfoType are connected by solid arrows.
Elements that belong to an InfoCourse are connected by
dashed arrows.

When five million records with 100 attributes
(e.g., 100 columns of a relational database table) are
loaded into the data storage system 902, then five
million InfoCourse trees (InfoCourses) exist, one for
each record. Each InfoCourse includes 100 nodes. Each
InfoCourse has a corresponding InfoCourse anchor
pointing to the respective InfoCourse. In other words,
when loading five million records into the data storage
system 902, five million InfoCourse anchors also
exist.

FIGs. 13 to 15 explain a data retrieval mechanism as an
example of a specific task that can be implemented by
multiple algorithms. FIG. 16 explains a first
implementation of the data retrieval mechanism and
FIGs. 17 to 19 explain a second implementation. Each
implementation is suitable for a corresponding
parameter value range (number of hits).

FIG. 13 illustrates how a computer implemented method
can be used to retrieve data from the data storage
system 902 when operated according to the invention. It
is assumed that the data storage system 902 stores the
data using the data structure as described in FIGs. 2
to 8. Note that in this data structure each InfoCourse
300, 301, 302, 303 has an InfoCourse anchor 310, 311,
312, 313.

Once the Boolean expression is received by the
data storage system 902, a parser decomposes the
Boolean expression 500 into the first portion 501 and
the second portion 502. If further portions are
included, they are also subject to decomposition. Each
portion includes at least one condition that has to be
fulfilled by any InfoCourse that is selected by the
original query. The conditions relate to InfoTypes.

The data storage system 902 then determines a
result set for each portion. In the example, a first
result set 361 includes result flags (C-FLAG1) in
compliance with the first portion 501 and a second
result set 362 includes further result flags (C-FLAG2)
in compliance with the second portion 502. A result
flag is used to indicate whether a specific InfoCourse
fulfils a condition in a corresponding portion. Each
result flag relates (bold up-arrows) to a result
identification number 351, 352 of the corresponding
result set 361, 362 to which it belongs. The result
flags within a result set are also interrelated (dashed
arrows). Further, each result flag relates (bold
left-arrows) to the corresponding InfoCourse anchor
(IC-anchor) 310, 311, 312, 313 of the InfoCourse
fulfilling the corresponding condition.

The two result sets 361, 362 can originate from
the evaluation of a complex Boolean expression, where
the first result set 361 can be the result of one
bracket including potentially any Boolean
sub-expression as sub-query. The same is true for the
second result set 362, e.g. representing another
bracket of the Boolean expression.

FIGs. 13 and 14 illustrate how the first and the second
result sets 361, 362 can be merged into a single result
set 363 when applying corresponding Boolean operators
to the result flags of the corresponding InfoCourse
anchors. One implementation of a data retrieval
algorithm using pointer lists is explained in more
detail in FIG. 16. Another implementation using bitmaps
is explained in more detail in FIGs. 17 to 19.




In the example of FIG. 14, the Boolean expression 500
combines the first and second result sets with a
Boolean OR operator.

For the combination, the InfoCourses or the
anchors are not needed anymore. The number of result
flags in each result set is known by, for example,
incrementing a corresponding counter when creating the
result flags.

In one implementation, the data storage system
runs through one of the result sets from the first to
the last result flag. Advantageously, the result set
including the lowest number of result flags is chosen
because of a shorter processing time, which becomes
more relevant in the case of Boolean AND combinations.
The first result set 361 includes three result flags
(C-FLAG1) and the second result set 362 includes two
result flags (C-FLAG2). Therefore, the data storage
system starts with the second result set 362 and then
processes the first result set 361. In one
implementation, for each IC-anchor to which a result
flag C-FLAG1 or C-FLAG2 relates, a corresponding result
flag R-FLAG is generated in the third result set 363
having result identification number 353.

In another implementation, one can also use the
first or second result set for storing the result of
the Boolean OR operation.

For example, when running through the second
result set 362, each C-FLAG2 can be "renamed" into
C-FLAG1. The result ID 352 of the second result set is
set to the result ID 351 of the first result set.

Further, it is checked whether a corresponding
C-FLAG1 result flag exists. If not, the data storage
system proceeds with the next C-FLAG2 in the second
result set 362. If a corresponding C-FLAG1 exists, then
one of these two result flags, either in the first or
in the second result set, is deleted to avoid
intersections (that is, two flags for the same
IC-anchor in the combined result set).

To find out whether a corresponding C-FLAG1
exists in the first result set 361, for example, the
data storage system moves along a circular structure
that is used to relate the result flags to their
corresponding IC-anchor.

After having processed all result flags of the
first and second result sets, only C-FLAG1 result flags
remain. The combination with OR means linking the two
result sets together into one result set. In this
example, in the end, all result flags have the result
ID 351 of the first result set 361. During the above
described procedure, the counters for the number of
result flags in each result set are continuously
updated (e.g., decremented when result flags are
deleted). Therefore, the number of result flags in the
"final" result set is the sum of the counters of the
first and second result sets just prior to linking them
together. This count result can be reported to an
application as the number of InfoCourses (records)
matching the first portion 501 or the second portion
502 of the Boolean expression 500.
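
The in-place OR combination can be sketched as follows.
This is a simplified illustration under assumptions:
result sets are modelled as Python sets of IC-anchor
identifiers (so `len()` stands in for the result
counters), the function name `combine_or` is
hypothetical, and the anchor numbers assigned to each
set are chosen for illustration rather than read from
the figures.

```python
def combine_or(first, second):
    """Merge `second` into `first` in place and return the hit count.

    Both arguments are sets of IC-anchor identifiers; len() plays the
    role of the result counters.
    """
    for anchor_id in list(second):
        if anchor_id in first:
            # A flag for this anchor already exists: drop the duplicate.
            second.discard(anchor_id)
    hits = len(first) + len(second)   # sum of counters just before linking
    first |= second                   # "rename" the remaining flags and link
    second.clear()
    return hits


first = {310, 311, 313}    # three C-FLAG1 flags (illustrative anchors)
second = {311, 312}        # two C-FLAG2 flags (illustrative anchors)
print(combine_or(first, second), sorted(first))  # 4 [310, 311, 312, 313]
```

Running through the smaller set keeps the number of
membership checks low, which is the reason the
description gives for starting with result set 362.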

The "final" result set may represent a real final
result set or an intermediate result when the Boolean
expression 500 includes further portions. In this case,
it is combined again with further result sets that
correspond to the further portions. A complex query
consisting of several nested sub-queries may be
evaluated recursively by combining the result sets of
sub-queries with the result sets of other sub-queries.
This continues until all levels of the Boolean
expression are resolved. At the end, a single result
set (e.g., result set 363) is left. Its number of
result flags corresponds to the number of hits for the
whole query (Boolean expression 500).

In the example of FIG. 15, the Boolean expression 500
combines the first and second result sets with a
Boolean AND operator.

Again, the data storage system knows the number of
result flags in each result set from the corresponding
result counters and starts with processing the result
set with the lowest number of result flags. This is
advantageous in the case of Boolean AND combinations
because the total number of result flags can only be as
large as the smallest result set. In the example of
FIG. 13 the second result set 362 is the smaller one.

For each IC-anchor to which both a result flag C-FLAG1
and a result flag C-FLAG2 relate, a
corresponding result flag R-FLAG is generated in the
third result set 363.

In one implementation, one can also use the first
or second result set for storing the result of the
Boolean AND operation.

For each result flag C-FLAG2 of the second result
set 362, the data storage system checks whether a
corresponding result flag C-FLAG1 exists in the first
result set 361. If so, the result flag C-FLAG2 is kept
and the data storage system proceeds with the next
result flag of the second result set. If no
corresponding result flag C-FLAG1 is found in the first
result set, then the result flag C-FLAG2 in the second
result set 362 is deleted.

At the end of this filtering procedure, the second
result set 362 includes the "final" result set and,
therefore, plays the role of the third result set 363.
The first result set 361 is not needed any more and can
be deleted.

Again, with each deletion of a result flag, the
corresponding counter is reduced accordingly.
Therefore, the counter of the second result set always
contains the current number of result flags C-FLAG2,
which, at the end of the filtering procedure,
corresponds to the number of hits for the query and may
be reported to an application.

As in the Boolean OR case, the "final" result set
may represent a real final result set or an
intermediate result when the Boolean expression 500
includes further portions that are subject to further
combinations.
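
The filtering procedure for the Boolean AND case can be
sketched in the same simplified model. As before, this
is an assumption-laden illustration: result sets are
Python sets of IC-anchor identifiers, `len()` replaces
the result counters, and `combine_and` is a
hypothetical name.

```python
def combine_and(first, second):
    """Filter the smaller set in place; it becomes the final result set."""
    if len(first) <= len(second):
        smaller, larger = first, second
    else:
        smaller, larger = second, first
    for anchor_id in list(smaller):
        if anchor_id not in larger:
            # No counterpart flag: delete, which also lowers the counter.
            smaller.discard(anchor_id)
    larger.clear()   # the other result set is not needed any more
    return smaller


first = {310, 311, 313}    # illustrative C-FLAG1 anchors
second = {311, 312}        # illustrative C-FLAG2 anchors
final = combine_and(first, second)
print(sorted(final), len(final))  # [311] 1
```

Only the flags of the smaller set are visited, so the
work is bounded by the size of the smallest result set,
as stated above.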

FIG. 16 illustrates a first implementation for the
result flags C-FLAG1, C-FLAG2 and the result sets.

In this first implementation, the data storage
system instantiates an instance (C-FLAG1, C-FLAG2) of a
result flag class for each result flag. Multiple result
flags for one InfoCourse 300 (record) are connected in
a ring structure 800. The ring structure 800 relates
330, 320 to the corresponding IC-anchor 310.
Advantageously, a docket element (D-FLAG) is used. The
docket element represents a counterpart of the
IC-anchor 310 on the side of the result flags. One
advantage is that the docket element is decoupled from
the IC-anchor in the sense that it is derived from a
different class than the IC-anchor. Therefore, it can
provide different functions than the IC-anchor. These
functions can be used by the other result flags because
the docket element is instantiated from the same class
as the result flags. The decoupling allows instances
of the result flag class to consume less memory than
a corresponding IC-anchor that has, for example, more
pointers, additional administrative information (e.g.,
number of elements in a substructure), methods that
operate on attributes, such as "sort elements" or
"balance tree", etc. The docket element D-FLAG has a
docket pointer 330 pointing at the corresponding
IC-anchor 310, whereas the IC-anchor 310 has an anchor
pointer 320 pointing at the corresponding docket
element. Using the ring structure 800, the data storage
system can quickly identify any result flag related to
a specific IC-anchor.

To summarize, each InfoCourse has one IC-anchor
that relates to a corresponding docket element. That
is, an InfoCourse (record) 300 is represented by an
IC-anchor 310 and the corresponding docket element
D-FLAG. The docket element is the docking point for the
result flags C-FLAG1, C-FLAG2. A result flag
semantically plays the role of a dynamic flag. If a
result flag is connected to a docket element, the
InfoCourse, which is represented by the docket element,
has been selected. That is, it fulfils one or more
conditions of the original query.

Multiple result flags that relate to different
IC-anchors may be linked together in a pointer list by
means of pointers (e.g., pUp and pDown). This is also
valid for the docket elements, since technically
speaking they are also instances of the result flag
class. A linear list of result flags is called a result
set. Each result set is identified by a result ID. A
result set flags a subset of InfoCourses that comply
with at least a portion of the Boolean expression 500
in the query.

In the example, the first result set 361 is
implemented by the first pointer list PL-1 that
includes the result flag pointers C-FLAG1 and has the
result ID 351. The second result set 362 is implemented
by the second pointer list PL-2 that includes the
result flag pointers C-FLAG2 and has the result ID 352.
The docking elements formally are also linked in a
pointer list PL-D having its own result ID 350.

Several result sets may exist simultaneously. On
the level of a docket element D-FLAG, the result flags
can be linked in the circular structure 800 using
pointers, such as pointer pSmallId and pointer
pLargeId. The pointer names indicate that the result
flags in the circular structure 800 can be sorted by
result ID. The circular structure 800 can be run
through in both directions, e.g. to find the result
flag of a particular result set. Sorting the result
flags in the circular structure 800 by result ID helps
to decide in which direction the circular structure
should be searched for a fast identification of a
certain result ID.

FIGs. 13 and 14 describe an implementation for
applying the Boolean OR and AND operators to two result
sets. These operators may be combined with a Boolean
NOT operator. In this case, the data storage system
runs through the docket elements of all IC-anchors and
instantiates a result flag in a new result set each
time there is no result flag in the original
result set to which the NOT operator is applied. At the
end of the procedure the original result set is not
used any more and can be deleted. Note that the
InfoCourses are not needed to perform the inversion.
Only the IC-anchors are used.
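
A minimal sketch of the inversion, again modelling a
result set as a Python set of IC-anchor identifiers.
The function name `combine_not` and the anchor numbers
are illustrative assumptions.

```python
def combine_not(all_anchor_ids, original):
    """Return a new result set flagging every anchor absent from `original`."""
    inverted = {a for a in all_anchor_ids if a not in original}
    original.clear()          # the original result set is no longer used
    return inverted


anchors = [310, 311, 312, 313]
print(sorted(combine_not(anchors, {311, 312})))  # [310, 313]
```

Note that the sketch touches only anchor identifiers,
never record contents, in line with the remark that the
InfoCourses are not needed for the inversion.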

The number of hits as well as some or all of the
InfoCourses that match the query may be returned to an
application. As an example, assume that 20.000
InfoCourses are found. That is, the final result set
contains 20.000 result flags. If an application
requests the next 20 InfoCourses after the 5.390th
InfoCourse from the data storage system, then the
request can be satisfied by using the final result set.
The result flag 5.390 (offset) is located by running
down the final result set and counting the result flags
until the offset result flag at position 5.390 is
reached. The next 20 InfoCourses are read from the
corresponding tree structures (e.g., by using the
IC-anchors that relate to the corresponding docket
elements). The retrieved values may be serialized, for
example, into a Send-Buffer-Structure or any other kind
of appropriate communication data structure. Any type
of transport format and/or rearrangement,
concatenation, etc. of data may be used for the
Send-Buffer-Structure (e.g., the use of fixed lengths).
Preferably, the application knows the data format
provided by the data storage system to ensure stable
communication.

For a fast localisation of a specific InfoCourse
(e.g., number 5.390) it is useful to subdivide a result
set into intervals. One can use an interval pointer
which points to the result flag in the middle of the
result set (e.g., result flag 10.000 of 20.000) or to
any other sub-interval of the result set, such as
quarters. According to the offset requested by the
application, the data storage system can jump to the
nearest interval pointer and then sequentially run
through only a part of the result set (e.g., upwards or
downwards) and count until the requested result flag
(e.g., docket element D-FLAG) has been reached. It is
useful to choose the direction having the shortest
distance to the requested offset position. For example,
assume that there are 20.000 result flags in the result
set. If InfoCourse 15.390 is requested as an offset and
no interval pointers are available, then it is
advantageous to start at the bottom of the result set
(result flag 20.000) and run through 20.000 - 15.390 +
1 = 4.611 result flags instead of starting at the top
and running through 15.390 result flags. The same is
true when using interval pointers.
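
The choice of starting point can be expressed as a
small calculation. The helper below is a hypothetical
sketch: `plan_walk` and its parameters are assumptions,
positions are 1-based, and the number of visited flags
counts both the starting flag and the requested flag,
which reproduces the 4.611 figure given above.

```python
def plan_walk(offset, total, interval_pointers=()):
    """Return (start_position, flags_to_visit) with the shortest walk.

    Candidate starting points are the top (1), the bottom (total) and any
    interval pointers. Walking from start s to `offset` visits
    abs(offset - s) + 1 flags, both ends included.
    """
    starts = [1, total, *interval_pointers]
    best = min(starts, key=lambda s: abs(offset - s))
    return best, abs(offset - best) + 1


print(plan_walk(15_390, 20_000))            # (20000, 4611)
print(plan_walk(5_390, 20_000))             # (1, 5390)
print(plan_walk(5_390, 20_000, [10_000]))   # (10000, 4611)
```

With a single interval pointer in the middle, no
request ever needs to walk more than about a quarter of
the result set.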

For example, the above described implementation may
correspond to the algorithm An (cf. FIG. 4). To achieve
the best data retrieval performance, it can be
advantageous to use this algorithm An when the current
value of the number of hits (e.g., pc; cf. FIG. 4) is
below the threshold value B(n) (cf. FIG. 4) in the
"number of hits" dimension and the complexity of the
Boolean expression is in the interval [B'(n - 1),
B'(n)] of the second dimension. The second dimension
parameter p' in FIG. 4, therefore, reflects the
complexity of the Boolean statement in this example.
The threshold value B'(n) may be defined through a
Boolean AND expression including a single condition
portion ("single-condition-Boolean-AND").

In the previously explained general result flag
instance based implementation, the Boolean AND operator
was applied to two result sets. In another
implementation, a "Lean AND" can be implemented in case
only one result set exists as a result of one portion
of the Boolean expression and this result set is to be
combined with a single condition through a Boolean AND
operator (Boolean AND expression). The query may have a
syntax like: (<complex subquery>) AND condition C1.
Also, for multiple non-nested conditions combined with
AND at the same bracket level, the "Lean AND" can be
used. A syntax example for this kind of flat Boolean
expression is: C1 AND C2 AND ... AND Cn, where C1 to Cn
are conditions.

This "Lean AND" implementation may, for example,
correspond to the algorithm A'n (cf. FIG. 4). To
achieve the best data retrieval performance, it can be
advantageous to use the algorithm A'n at the "number of
hits" value pc when the complexity of the Boolean AND
expression is below a certain threshold value (e.g., in
the second dimension interval [B'(n), B'(n+1)]; cf.
FIG. 4).

In the above examples, only one result set exists
and one or more conditions are to be combined with the
Boolean AND operator.

Assume a Boolean expression, such as:

C1 AND C2 AND ... AND Cn.

As explained before, the data storage system 902
is able to quickly find out which of the conditions C1
to Cn has the lowest number of hits, that is, the
highest selectivity. The total number of hits in the
intersection set of all conditions cannot be larger
than the number of hits for the condition with the
highest selectivity.

Therefore, the data storage system creates a
result set for the condition with the highest
selectivity, then runs through all result flags of the
result set and checks for each result flag whether the
remaining conditions are fulfilled by the corresponding
InfoCourse or not. In this implementation the
InfoCourses are needed to check, for example, a
condition NAME_FIRST = 'Peter'. The data storage system
uses the relation from the result flag through the
docking element to the corresponding IC-anchor, which
points at the corresponding InfoCourse. The
corresponding InfoCourse tree is then searched for the
InfoType values according to the remaining conditions.

In this implementation, a second result set does not
need to be checked against the result set because the
checking is directly performed on the related
InfoCourses. As a consequence, the time to instantiate
all the result flags of a second result set is saved by
directly searching the InfoCourses matching the result
set (already the most selective condition) and checking
directly if the corresponding values match or not.

For each result flag this check is performed for
one or more conditions. For example, in a query C1 AND
C2 AND C3 AND C4, a result set is instantiated for the
most selective condition and for each result flag the
three other conditions are checked accordingly. If at
least one condition does not match, the corresponding
result flag is deleted from the result set and the
result counter is adjusted accordingly.

Finally, the result set flags all matching
InfoCourses (records) and the result counter has the
correct number of hits, which may be reported to an
application.

For example, the "Lean AND" implementation can be
advantageous when the time to instantiate the result
flag instances of the second result set exceeds the
time to check the corresponding InfoCourses.
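
The "Lean AND" idea can be sketched as a filter over a
single result set. Everything named here is an
assumption made for illustration: `lean_and`, the
`lookup` callable that stands in for following the
IC-anchor to its InfoCourse tree, and the sample
records, which are not taken from table A.

```python
def lean_and(result_set, remaining_conditions, lookup):
    """Filter `result_set` in place and return the number of hits.

    result_set           -- set of IC-anchor ids for the most selective condition
    remaining_conditions -- predicates taking a record (stand-in for an InfoCourse)
    lookup               -- maps an IC-anchor id to its record
    """
    for anchor_id in list(result_set):
        record = lookup(anchor_id)
        if not all(cond(record) for cond in remaining_conditions):
            result_set.discard(anchor_id)   # the result counter shrinks
    return len(result_set)


# Illustrative data only.
records = {
    310: {"NAME_FIRST": "Peter", "AGE": 30},
    311: {"NAME_FIRST": "Peter", "AGE": 25},
    312: {"NAME_FIRST": "Mary", "AGE": 30},
}
hits_for_name = {310, 311}   # result set for NAME_FIRST = 'Peter'
count = lean_and(hits_for_name, [lambda r: r["AGE"] == 30], records.__getitem__)
print(count, hits_for_name)  # 1 {310}
```

No second result set is ever built; the remaining
conditions are evaluated directly against the records
reachable from the first one.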

If the Boolean expression has only one single
condition, result sets are not necessary. Result sets
become valuable in case of a combination of several
conditions in a corresponding query. In the particular
case of only one condition, the count result for the
total number of hits is obtained by means of the tree
structures as described in FIGs. 5 to 11. If
InfoCourses have to be returned to an application, this
can also be done by using the tree structures. Instead
of a result set identifying the matching InfoCourses,
the tree nodes as such are used. For example, instead
of running through a result set to visit all matching
InfoCourses and collect the data into a
Send-Buffer-Structure, only the matching sub-tree
structures identified by the corresponding start
pointers are traversed. As soon as the offset
InfoCourse is found, only the number of InfoCourses
that is to be returned has to be visited. For example,
when 10 InfoCourses have to be returned to an
application, only 10 nodes from the offset InfoCourse
on have to be traversed in the corresponding InfoType
sub-tree. From each node in an InfoType tree the
corresponding anchor object can be reached, and from
the anchor each attribute value of the given InfoCourse
can be reached.

FIGs. 16 to 18 illustrate a second implementation for
the result flags C-FLAG1, C-FLAG2 and the result sets,
leading to a second algorithm for the data retrieval
task.

This second implementation is appropriate for very
large result sets, where the first implementation would
require many result flag instances that consume a lot
of memory space of the data storage system.

FIG. 17 illustrates three bitmaps BM-n, BM-n+1, BM-n+2.
In the example, the start of bitmap BM-n coincides with
the basis address of bitmaps in the memory of the data
storage system.

For example, a first bitmap BM-n corresponds to
the first result set 361 and a second bitmap BM-n+2
corresponds to the second result set 362. The result
flags C-FLAG1, C-FLAG2 are implemented as bits in the
respective bitmaps.

A bitmap in general consists of multiple machine
words. Depending on the hardware architecture of the
data storage system, a machine word can consist, for
example, of 32 or 64 bits. The second implementation
also works with any other machine word length, such as
128 bit or more. A bitmap is a contiguous concatenation
of machine words in a sufficiently large area of the
data storage system memory. The number of bits in a
(result set) bitmap corresponds to the number of
IC-anchors of the InfoCourses selected by the Boolean
expression 500. Therefore, each bitmap has the maximum
size of a result set. Multiple bitmaps can
simultaneously exist in the memory. Each bitmap (result
set) is identified by a corresponding result ID. Each
result ID points to the start address of its
corresponding bitmap. For example, the first result ID
351 and the second result ID 352 point to the start
address of the first bitmap BM-n and the second
bitmap BM-n+2, respectively.

Assume 5 million records (InfoCourses) are loaded
into the tree structures of the data storage system.
Therefore, 5 million IC-anchors exist and, in this
example, one bitmap includes 5 million bits (one bit
per IC-anchor). The bitmap occupies 5.000.000 / 8 =
625.000 bytes = 610 KB. The 5 million bits correspond
to 5.000.000 / 64 = 78.125 machine words on a 64 bit
hardware platform and to 5.000.000 / 32 = 156.250
machine words on a 32 bit hardware platform. This
example shows that a bitmap can be made up of tens of
thousands or even more machine words. The number of
machine words in a bitmap is only physically limited by
the size of the available main memory and the
addressability of the main memory.
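
The arithmetic of this example can be checked with a
short calculation. The helper name `bitmap_layout` is
hypothetical; rounding up is an assumption that only
matters when the number of anchors is not a multiple
of the word length.

```python
def bitmap_layout(n_anchors, word_bits):
    """Return (bytes, machine_words) needed for one bit per IC-anchor."""
    n_bytes = (n_anchors + 7) // 8                       # round up to whole bytes
    n_words = (n_anchors + word_bits - 1) // word_bits   # round up to whole words
    return n_bytes, n_words


print(bitmap_layout(5_000_000, 64))  # (625000, 78125)
print(bitmap_layout(5_000_000, 32))  # (625000, 156250)
print(round(625_000 / 1024))         # 610
```

The last line confirms the figure of roughly 610 KB
when one KB is taken as 1024 bytes.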

That is, each bitmap may consist of a
theoretically unlimited number of machine words, where
the length of a machine word depends on the given
hardware platform and/or the operating system of the
data storage system. Each bitmap is referenced by a
result ID. Preferably, the result IDs 351, 352 are
stored in a tree structure allowing direct access to
the start address of the corresponding bitmap via a
pointer. In general, any structure (e.g., a linear
list) can be used to administrate the result IDs.
However, for large numbers of result IDs the access to
a specific result set is more efficient when using a
tree structure than when using a linear list or another
structure.

When the start address of a specific bitmap has
been found, this specific bitmap can be used to count
the number of hits (number of result flags) or to
return data to an application.

In the second implementation, each bitmap has a
counter counting the number of result flags, that is,
the number of bits set to 1. To count the number of
hits, the data storage system can run through all
machine words of the bitmap. If a machine word has a
value of zero, then all bits of the machine word are
zero and the next machine word can be checked. For
machine words having a value different from zero the
data storage system determines the number of bits that
are set to 1. This can be achieved by known methods,
such as shifting the bits of a machine word into one
direction, testing with bit masks performing a bit by
bit AND operation, etc. Each time a bit set to 1 is
found, the counter is increased by 1. At the end of the
procedure the counter value corresponds to the number
of bits set to 1 and, therefore, the number of result
flags in the corresponding result set.
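
The counting procedure can be sketched with the
shift-and-mask method mentioned above. The sketch
assumes that a bitmap is given as a list of
non-negative Python integers, one per machine word;
the function name `count_hits` is hypothetical.

```python
def count_hits(words):
    """Count the bits set to 1 across all machine words of a bitmap."""
    counter = 0
    for word in words:
        if word == 0:
            continue            # all bits are zero, go to the next word
        while word:
            counter += word & 1  # bit mask test on the lowest bit
            word >>= 1           # shift the next bit into position
    return counter


print(count_hits([0b1011, 0, 1 << 63]))  # 4
```

On Python 3.10 or later, the inner loop could be
replaced by `word.bit_count()`, which returns the same
per-word result.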



FIG. 18 illustrates how result flags relate to
corresponding IC-anchors in the second implementation
using memory mapping. The shown bits in the bitmap
memory represent only a portion of the bitmap memory
area.

In contrast to the first implementation, bits of a
bitmap and the corresponding IC-anchors are not linked
by pointers. However, IC-anchors and their
corresponding bits are related by a memory mapping rule
using relative addresses. A memory manager of the data
storage system can ensure that the IC-anchors and the
bitmaps reside in contiguous memory areas. The data
storage system can then locate any IC-anchor that
relates to a specific bit in a bitmap.

For retrieving data (InfoCourses) in response to a
query the data storage system identifies the
corresponding IC-anchors. Using the IC-anchor and the
corresponding InfoCourse tree, a specific node in an
InfoType tree can be found and the value can be read
from the node. The value can then be copied, for
example, to a Send-Buffer-Structure as described
earlier. To locate the specific bit that corresponds to
the identified IC-anchor the data storage system can
use an algorithm that works with relative addresses.

A specific bit is part of a machine word. Assume
that this specific bit is bit number K. The machine
word has a memory address MWA. The whole bitmap has a
start address SA. For example, the relative address of
the specific bit is calculated as BA = (MWA - SA) * 64
+ K for 64 bit long machine words and BA = (MWA - SA) *
32 + K for 32 bit long machine words. At this memory
location, the specific bit of the bitmap can be
checked. If it is set to 1, the InfoCourse with the
corresponding IC-anchor is part of the result set.
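
A small sketch of this relative addressing follows. It assumes
that MWA and SA are counted in machine words (so that MWA - SA
is a word offset) and that bit number K is counted from the
least significant bit, starting at 0. The function names are
hypothetical.

```python
WORD_BITS = 64  # assumed machine word width


def relative_bit_address(mwa, sa, k, word_bits=WORD_BITS):
    """BA = (MWA - SA) * word_bits + K, with addresses counted in words."""
    return (mwa - sa) * word_bits + k


def is_hit(bitmap_words, ba, word_bits=WORD_BITS):
    """Return True if the bit at relative address BA is set to 1."""
    word = bitmap_words[ba // word_bits]
    return ((word >> (ba % word_bits)) & 1) == 1
```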

The IC-anchor can be found in the IC-anchor memory
area in the following way. All IC-anchors reside in the
IC-anchor memory area with the basis address C. The
size AS of an IC-anchor is known. Therefore, the
IC-anchor address AA of the specific IC-anchor can be
calculated as AA = C + BA * AS. Therefore, a pointer
that is set to the address AA points to the requested
IC-anchor.

When creating a result set bitmap in compliance
with a portion of the Boolean expression, the result
flag bits that relate to IC-anchors of the
corresponding InfoCourses are set to 1. This can also
be achieved by using relative addresses.

Each IC-anchor has a memory address AA. The basis
address of the IC-anchor memory area is C. By knowing
the size AS of an IC-anchor, the IC-anchor position
number can be calculated as BA = (AA - C) / AS. That
is, the result flag bit for the BAth IC-anchor is to be
located in the bitmap memory area. The start address of
a specific bitmap (result set identified by result ID)
is SA. The machine word address where the bit is
located is calculated as MAW = SA + BA div 64 on a 64
bit hardware platform and MAW = SA + BA div 32 on a 32
bit hardware platform, where the div operator divides
one integer number by another integer number and
returns the integer part of the result. Within the
identified machine word at MAW the Kth bit has to be
set to 1 with K = BA mod 64 on a 64 bit hardware
platform and K = BA mod 32 on a 32 bit hardware
platform, where the mod operator divides two integer
numbers and returns only the remainder. Alternatively,
K could also be calculated as:

K = BA - (MAW - SA) * BS, where BS is the addressable
number of bits in the used hardware platform/operating
system.
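
The two mapping directions can be illustrated with a short
sketch. It assumes byte addresses for AA and C, a word-indexed
Python list standing in for the bitmap memory area (so the list
index plays the role of MAW - SA), and bit K counted from the
least significant bit. All function names are hypothetical.

```python
def anchor_position(aa, c, anchor_size):
    """BA = (AA - C) / AS: position number of the IC-anchor at address AA."""
    return (aa - c) // anchor_size


def anchor_address(ba, c, anchor_size):
    """AA = C + BA * AS: address of the IC-anchor with position number BA."""
    return c + ba * anchor_size


def set_result_flag(bitmap_words, ba, word_bits=64):
    """Set the result flag bit for the BAth IC-anchor to 1.

    Returns (word_index, k) with word_index = BA div word_bits
    and k = BA mod word_bits.
    """
    word_index = ba // word_bits
    k = ba % word_bits
    bitmap_words[word_index] |= 1 << k
    return word_index, k
```

With these definitions the alternative formula for K also holds:
k equals ba - word_index * word_bits.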

FIG. 19 illustrates how AND/OR/NOT operators can be
applied to the first and second bitmaps BM-n, BM-n+2 by
sequentially combining the corresponding pairs of
machine words. The machine words are illustrated by
circles with a number that corresponds to the position
of the machine word within its bitmap. Machine word 1
of the first bitmap BM-n is combined with machine word
1' of the second bitmap BM-n+2. This is repeated for
all pairs of machine words (2,2'), (3,3'), and so on,
with respect to the first and second bitmaps. Since all
IC-anchors are represented by a corresponding bit in
each of the bitmaps, the bitmaps have the same size
and, thus, contain the same number of machine words.

The logical combination of pairs of machine words
by applying the Boolean AND or OR operators can be
performed as a bit-by-bit AND or OR operation. Usually,
the CPU (processor) can perform this in one processing
cycle. Programming languages, such as C++, offer
commands for this kind of bit-by-bit operation.

The result of the combination of first and second
result set bitmaps may be written to a new, third
bitmap (e.g., BM-n+1) or to one of the two original
bitmaps BM-n, BM-n+2. This depends on whether the
original bitmaps may be overwritten or are to be kept
for later use.

After the processing of each pair of machine words
the result flag counter counts how many bits are set to
1 in the resulting machine word. The sum of the
counting results for all machine words of the resulting
bitmap corresponds to the total number of bits set to 1
in the resulting bitmap and may be reported to an
application as the number of hits.

The application of the Boolean NOT operator to a
bitmap is performed as a bit-by-bit NOT operation
applied to each machine word in the bitmap. Again, the
result may overwrite the original bitmap or can be
written to another bitmap if the original bitmap has to
be kept for later use.
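
The word-wise combination can be sketched as follows. The
sketch writes the result to a new bitmap and counts the hits
per resulting machine word. Bitmaps are modelled as lists of 64
bit words; the masking of unused padding bits in the last word
after a NOT is an assumption of this sketch and is not stated
in the description. The names combine and negate are
hypothetical.

```python
WORD_BITS = 64                   # assumed machine word width
WORD_MASK = (1 << WORD_BITS) - 1


def combine(first, second, op):
    """Combine two equally sized bitmaps word by word with AND or OR.

    Returns the new bitmap and the number of bits set to 1 in it.
    """
    if len(first) != len(second):
        raise ValueError("bitmaps must contain the same number of words")
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    result, hits = [], 0
    for w1, w2 in zip(first, second):
        word = (w1 & w2) if op == "AND" else (w1 | w2)
        result.append(word)
        hits += bin(word).count("1")  # count bits of the resulting word
    return result, hits


def negate(bitmap_words, num_anchors):
    """Apply a bit-by-bit NOT to every word, keeping padding bits at 0."""
    result = [~word & WORD_MASK for word in bitmap_words]
    unused = len(result) * WORD_BITS - num_anchors
    if result and unused:
        result[-1] &= WORD_MASK >> unused
    return result
```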

The bitmap implementation described so far may
correspond to the algorithm A(n + 1) (cf. FIG. 4) that
has a better performance than the algorithm An in case
the number of hits exceeds the threshold value B(n)
and the complexity of the Boolean expression is in the
interval [B'(n - 1), B'(n)].

The "Lean AND", as described under FIG. 16, only
needs one result set and can also be implemented when
using bitmaps. The bitmap "Lean AND" implementation
may correspond to the algorithm A'(n + 1) (cf. FIG. 4)
that is to be used when the complexity of the Boolean
AND expression is in the interval [B'(n), B'(n + 1)] in
the complexity dimension and the current number of hits
is in the interval [B(n), B(n + 1)].

For example, the Boolean expression includes five
conditions that are combined by Boolean AND operators:
C1 AND C2 AND C3 AND C4 AND C5. A result set bitmap is
set up (as described in FIGS. 13, 14) for the condition
with the highest selectivity of all conditions included
in the Boolean expression. Then the data storage system
runs through the bitmap from the first to the last bit.
For each bit that is set to 1 the data storage system
jumps to the corresponding InfoCourse and checks if all
other conditions are fulfilled by the corresponding
InfoCourse. This check is performed in the same way as
described in the implementation using result flag
instances (cf. FIG. 16). If all conditions are true the
bit keeps its value 1, otherwise the bit is set to 0.
When a bit is set to 0, the corresponding result
counter containing the number of bits that are set to 1
is reduced by 1. Therefore, the result counter always
contains the current number of hits.

Alternatively, instead of getting the number of
hits from the initial bitmap and reducing the counter
each time a bit is set to 0 when an InfoCourse does not
match the other conditions, the counter may also count
the number of bits that are set to 1 in the bitmap
after the "Lean AND" has been applied.
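
A minimal sketch of the bitmap "Lean AND" follows. It assumes
that InfoCourses can be modelled as a Python list indexed by
IC-anchor position and that each remaining condition is a
predicate function; both are simplifications for illustration,
and the name lean_and is hypothetical.

```python
def lean_and(bitmap_words, hit_count, info_courses, other_conditions,
             word_bits=64):
    """Clear every set bit whose InfoCourse fails one of the other conditions.

    bitmap_words is modified in place; the updated hit count is returned.
    """
    for position, course in enumerate(info_courses):
        index, k = divmod(position, word_bits)
        if not (bitmap_words[index] >> k) & 1:
            continue  # bit is 0: not a candidate for the result set
        if not all(condition(course) for condition in other_conditions):
            bitmap_words[index] &= ~(1 << k)  # set the bit to 0
            hit_count -= 1                    # keep the result counter current
    return hit_count
```

Only one bitmap is needed: the one built for the most selective
condition is narrowed down in place.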

FIG. 20 is a simplified block diagram of software
components of the computer system 990 that can be used
with an embodiment of the invention to dynamically
select a data retriever implementation in dependence of
a specific environment. In the example, the data
retriever implementations 111, 112, 113, and 114 are
characterized by their respective algorithms An,
A(n + 1), A'n, and A'(n + 1).

A result set may potentially contain millions of
result flags given a sufficiently large number of
InfoCourses loaded. Using the first or third data
retriever 111, 113, for example, on a 64 bit
architecture one pointer address already occupies 64
bit (8 bytes). Each result flag has at least two
pointers plus the content of the result flag.
Therefore, one result flag may occupy several hundreds
of bytes. In this case, one result set containing some
millions of result flags occupies memory space in the
range of up to several hundreds of megabytes. This is
in addition to the memory space occupied by the tree
structures that also reside in main memory. Further, as
the data storage system using the first or third data
retriever 111, 113 processes the result sets
sequentially and checks result flag instance by result
flag instance to perform AND/OR combinations, this may
lead to processing times of several seconds for one
combination when applied to very large result sets
(e.g., several millions of result flag instances). One
can apply an appropriate parallelisation to the first
implementation or use the second implementation to
overcome these issues.

The second and fourth data retriever
implementations 112, 114 use bitmaps for result sets.
Bitmaps are a representation of result sets that
consumes considerably less memory space than the result
sets of the first and third data retrievers. Further,
bit-by-bit operations are performed very fast and can
usually be handled in one CPU processing cycle per
machine word, if supported by a programming language,
such as C++.

For very large result sets, such as several
millions of bits (e.g., in the interval [B(n),
B(n + 1)]), it can be more time saving to perform Boolean
combinations of result sets using bitmaps instead of
pointer lists of result flag instances. Already the
instantiation of millions of result flag instances may
last several seconds if parallelisation is not used. In
addition the time for performing the Boolean
combination has to be considered. The Boolean
combination of result flag instances is performed one
by one at the instance level.

However, when a result set contains only a small
number of hits (e.g., in the interval [B(n-1), B(n)])
then bitmaps may be "almost empty". That is, only a
small number of bits is set to 1 in a large number of
machine words (e.g., for 5 million InfoCourses a bitmap
includes 78,125 machine words on a 64 bit platform). In
a bad case only one bit is set to 1 in each machine
word. Therefore, for small result sets the use of
result flag instances may be advantageous.
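
The figures above can be checked with a little arithmetic. The
sketch below assumes 64 bit words and 8 byte pointers; the
estimate for result flag instances is only a lower bound that
counts the two pointers per result flag and ignores the content
of the result flag. The function names are hypothetical.

```python
def bitmap_words(num_info_courses, word_bits=64):
    """Number of machine words needed for one bit per InfoCourse."""
    return (num_info_courses + word_bits - 1) // word_bits


def bitmap_bytes(num_info_courses, word_bits=64):
    """Memory occupied by one result set bitmap, in bytes."""
    return bitmap_words(num_info_courses, word_bits) * word_bits // 8


def min_flag_list_bytes(num_hits, pointer_bytes=8, pointers_per_flag=2):
    """Lower bound for a pointer list: two pointers per result flag."""
    return num_hits * pointer_bytes * pointers_per_flag
```

For 5 million InfoCourses the bitmap needs 78,125 words, or
625,000 bytes, regardless of the number of hits, whereas a list
of 5 million result flags needs at least 80,000,000 bytes for
its pointers alone.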

Within the corresponding intervals in the "number
of hits" dimension, a further performance dependency
on the complexity of the Boolean statement exists. This
dependency has already been explained under FIGs. 16 to
18.

The computer program product components in FIG. 20
allow the data storage system to switch from one data
retriever implementation to another if the other
implementation uses an algorithm that is more
advantageous in a specific environment. For example,
this can be achieved by transforming pointer lists of
the first implementation into bitmaps of the second
implementation and vice versa. For these
transformations the above explained procedures for
creating result flag instances and for creating bitmaps
can be used.
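
One way such a transformation might look is sketched below. As
a simplifying assumption, the pointer list of result flag
instances is modelled as a plain list of IC-anchor position
numbers; the real result flag instances carry pointers and
content that are not modelled here. The function names are
hypothetical.

```python
def positions_to_bitmap(positions, num_anchors, word_bits=64):
    """Build a bitmap with one bit per IC-anchor from a list of positions."""
    words = [0] * ((num_anchors + word_bits - 1) // word_bits)
    for ba in positions:
        words[ba // word_bits] |= 1 << (ba % word_bits)
    return words


def bitmap_to_positions(bitmap_words, word_bits=64):
    """Return the positions of all bits set to 1, in ascending order."""
    positions = []
    for index, word in enumerate(bitmap_words):
        k = 0
        while word:
            if word & 1:
                positions.append(index * word_bits + k)
            word >>= 1
            k += 1
    return positions
```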

The data storage system decides by itself when it
is appropriate to use an implementation that is useful
for small result sets having a number of result flags
below a threshold value (e.g., up to several thousand
elements) or an implementation that is useful for
result sets having a number of result flags above the
threshold value. For each of these cases the data
storage system further decides whether the
corresponding "Lean AND" algorithm A'(n + 1) or the
more general algorithms An, A(n + 1) is advantageously
used.

This enables the data storage system to
automatically select the algorithm which is best in
terms of memory consumption and performance in a
specific situation. If a current value of one dimension
equals a threshold value of this dimension, then it is
not important whether the algorithm for the interval
above or below the threshold value is used because,
preferably, the threshold value is defined as the
break-even point for the two algorithms.

In the example, the threshold value in the "number
of hits" dimension may be the break-even point being
defined as the number of result flags in a result set
where the use of result flag instances leads to the
same system performance as the use of bitmaps. The data
storage system can determine the threshold value
dynamically, for example, by appropriate time
measurements. Therefore, on a given technology platform
for a given data volume, data value distribution, etc.,
the appropriate value for the threshold value can be
used in a stable environment at all times.

The query generator 101 generates the query that
includes the Boolean expression 500. For example, the
query generator can be implemented on the front end
computing device 901. The query generator 101 can also
be part of an application that runs on any other
computing device of the computer system 990.

Once the data storage system 902 receives the
Boolean expression through a corresponding interface,
the result counter 102 determines the corresponding
number of hits. Preferably, the result counter is
implemented in the data storage system 902.

The threshold evaluator 103 is able to perform
multidimensional comparisons. In other words, the
threshold evaluator can compare the current number of
hits with the intervals [B(n-1), B(n)] and [B(n),
B(n+1)] and, substantially simultaneously, the
complexity of the Boolean expression with the intervals
[B'(n-1), B'(n)] and [B'(n), B'(n+1)]. FIG. 2 explains
details about how to initialise the threshold values
defining the intervals.

In case the number of hits (result flags) is in
the interval [B(n-1), B(n)] and the Boolean expression
is a complex Boolean expression from the interval
[B'(n-1), B'(n)], the data storage system uses the
first data retriever 111. This case is illustrated by
bold solid connection lines between the threshold
evaluator and the first data retriever.

In case the number of hits (result flags) is in
the interval [B(n), B(n+1)] and the Boolean expression
is a complex Boolean expression from the interval
[B'(n-1), B'(n)], the data storage system uses the
second data retriever 112. This case is illustrated by
bold dotted connection lines between the threshold
evaluator and the second data retriever.

In case the number of hits (result flags) is in
the interval [B(n-1), B(n)] and the Boolean expression
includes a "single-condition-Boolean-AND" expression
from the interval [B'(n), B'(n+1)], the data storage
system uses the third data retriever 113. This case is
illustrated by bold dashed connection lines between the
threshold evaluator and the third data retriever.

In case the number of hits (result flags) is in
the interval [B(n), B(n+1)] and the Boolean expression
includes a "single-condition-Boolean-AND" expression
from the interval [B'(n), B'(n+1)], the data storage
system uses the fourth data retriever 114. This case is
illustrated by bold dotted-dashed connection lines
between the threshold evaluator and the fourth data
retriever.
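
The four cases above amount to a lookup in a two-dimensional
grid of intervals. The following sketch illustrates this with
one threshold per dimension (B(n) and B'(n)). The numeric scale
used for the complexity of a Boolean expression is an
assumption of the sketch, as are the names RETRIEVERS and
select_retriever; the reference numerals 111 to 114 are taken
from the description.

```python
from bisect import bisect_right

# (hits interval, complexity interval) -> data retriever.
# Interval 0 lies below the threshold, interval 1 at or above it.
RETRIEVERS = {
    (0, 0): 111,  # few hits, complex expression: result flag instances
    (1, 0): 112,  # many hits, complex expression: bitmaps
    (0, 1): 113,  # few hits, lean AND: result flag instances
    (1, 1): 114,  # many hits, lean AND: bitmaps
}


def select_retriever(hits, complexity, hit_threshold, complexity_threshold):
    """Pick the data retriever assigned to the intersection of intervals."""
    hit_interval = bisect_right([hit_threshold], hits)
    complexity_interval = bisect_right([complexity_threshold], complexity)
    return RETRIEVERS[(hit_interval, complexity_interval)]
```

A value equal to a threshold is assigned to the upper interval
here. Since the threshold is meant to be a break-even point,
either choice would be acceptable. Passing longer threshold
lists to bisect_right would extend the same idea to more than
two intervals per dimension.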

The retrieval time measuring component 104 of the
data storage system can measure the time that is
consumed by either data retriever implementation.
The threshold calculator 105 can dynamically
determine (re-calculate) the threshold values of the
various dimensions on the basis of the time
measurements with respect to the four data retriever
implementations. The recalculated threshold values can
be fed into the threshold evaluator 103 and used for
the next query.
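
The description does not prescribe how the threshold calculator
derives a threshold from the time measurements. One conceivable
approach, shown here purely as an assumption, is to fit a
straight line to the measured retrieval times of each of two
neighbouring implementations and to take the intersection of
the two lines as the break-even point. The names fit_line and
break_even are hypothetical.

```python
def fit_line(samples):
    """Least-squares fit t = a + b * x over (x, t) samples; returns (a, b).

    Needs at least two samples with different x values.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in samples)
    b = sxt / sxx
    return mean_t - b * mean_x, b


def break_even(samples_flags, samples_bitmap):
    """Number of hits at which both fitted lines predict the same time.

    Returns None if the fitted lines are parallel.
    """
    a1, b1 = fit_line(samples_flags)
    a2, b2 = fit_line(samples_bitmap)
    if b1 == b2:
        return None
    return (a2 - a1) / (b1 - b2)
```

For example, if retrieval with result flag instances takes
about 1 ms per 1,000 hits while the bitmap retrieval takes a
constant 3 ms, the fitted lines meet at roughly 3,000 hits.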

In general, there can be more threshold values
that correspond to even more data retrievers for even
more dimensions. That is, there can be further
dependencies on further parameters that are considered
by the threshold evaluator for selecting the
appropriate data retriever.

Embodiments of the invention can be implemented in
digital electronic circuitry, or in computer hardware,
firmware, software, or in combinations of them. The
invention can be implemented as a computer program
product, i.e., a computer program tangibly embodied in
an information carrier, e.g., in a machine-readable
storage device or in a propagated signal, for execution
by, or to control the operation of, data processing
apparatus, e.g., a programmable processor, a computer,
or multiple computers. A computer program can be
written in any form of programming language, including
compiled or interpreted languages, and it can be
deployed in any form, including as a stand-alone
program or as a module, component, subroutine, or other
unit suitable for use in a computing environment. A
computer program can be deployed to be executed on one
computer or on multiple computers at one site or
distributed across multiple sites and interconnected by
a communication network.

Method steps of the invention can be performed by
one or more programmable processors executing a
computer program to perform functions of the invention
by operating on input data and generating output.
Method steps can also be performed by, and apparatus of
the invention can be implemented as, special purpose
logic circuitry, e.g., an FPGA (field programmable gate
array) or an ASIC (application-specific integrated
circuit).

Processors suitable for the execution of a
computer program include, by way of example, both
general and special purpose microprocessors, and any
one or more processors of any kind of digital computer.
Generally, a processor will receive instructions and
data from a read-only memory or a random access memory
or both. The essential elements of a computer are at
least one processor for executing instructions and one
or more memory devices for storing instructions and
data. Generally, a computer will also include, or be
operatively coupled to receive data from or transfer
data to, or both, one or more mass storage devices for
storing data, e.g., magnetic, magneto-optical disks, or
optical disks. Information carriers suitable for
embodying computer program instructions and data
include all forms of non-volatile memory, including by
way of example semiconductor memory devices, e.g.,
EPROM, EEPROM, and flash memory devices; magnetic
disks, e.g., internal hard disks or removable disks;
magneto-optical disks; and CD-ROM and DVD-ROM disks.
The processor and the memory can be supplemented by, or
incorporated in special purpose logic circuitry.

To provide for interaction with a user, the
invention can be implemented on a computer having a
display device, e.g., a cathode ray tube (CRT) or
liquid crystal display (LCD) monitor, for displaying
information to the user and a keyboard and a pointing
device, e.g., a mouse or a trackball, by which the user
can provide input to the computer. Other kinds of
devices can be used to provide for interaction with a
user as well; for example, feedback provided to the
user can be any form of sensory feedback, e.g., visual
feedback, auditory feedback, or tactile feedback; and
input from the user can be received in any form,
including acoustic, speech, or tactile input.

The invention can be implemented in a computing
system that includes a back-end component, e.g., as a
data server, or that includes a middleware component,
e.g., an application server, or that includes a
front-end component, e.g., a client computer having a
graphical user interface or a Web browser through which
a user can interact with an implementation of the
invention, or any combination of such back-end,
middleware, or front-end components. The components of
the system can be interconnected by any form or medium
of digital data communication, e.g., a communication
network. Examples of communication networks include a
local area network (LAN) and a wide area network (WAN),
e.g., the Internet.

The computing system can include clients and
servers. A client and server are generally remote from
each other and typically interact through a
communication network. The relationship of client and
server arises by virtue of computer programs running on
the respective computers and having a client-server
relationship to each other.

Although an embodiment of the invention has been
described in detail using a data storage system having
a plurality of data retrieval algorithms, the invention
is not limited to this embodiment. Rather, other
software applications making use of the spirit of the
invention as broadly described by the claims are
considered to be within the scope of the invention.

Claims



1. A computer implemented method for automatic
software tuning comprising the steps of:

calculating (410) at least one threshold value for
at least one parameter (P1) influencing the
performance of a software application (200)
with regards to a specific task;
comparing (430) the at least one threshold value to
at least one corresponding current value; and
selecting (440) an algorithm (A1) from a plurality
of algorithms (A1 to AN) for performing the
task in accordance with the result of the
comparing step (430).

2. Method of claim 1 comprising the further steps of:
measuring (450) the performance of the selected
algorithm (A1);
checking (460) whether the measured performance
complies with the at least one threshold value;
and
recalculating (470) the at least one threshold
value in case of non-compliance.

3. Method of any one of the previous claims, where the
at least one threshold value separates the value
range of the parameter (P1) into at least two
intervals of a first dimension.

4. Method of claim 3, wherein the selecting step (440)
selects the algorithm (A1) that is assigned to the
interval that includes the corresponding current
value of the first dimension.

5. Method of claim 3, where at least one further
threshold value separates the value range of a
further parameter into at least two intervals of a
second dimension.



6. Method of claim 5, wherein the selecting step (440)
selects the algorithm (A1) that is assigned to the
intersection of the interval of the first dimension
that includes the corresponding current parameter
value of the first dimension and the interval of
the second dimension that includes the
corresponding current parameter value of the second
dimension.



7. Method of any one of the claims 3 to 6, wherein
each threshold value corresponds to a break-even
point where two neighbouring algorithms have the
same performance with respect to the corresponding
dimension.

8. A computer program product for automatic software
tuning comprising a plurality of instructions that
when loaded into a memory of a computer system
(990) cause at least one processor of the computer
system (990) to execute the steps of any one of the
claims 1 to 7.

9. Information carrier comprising the computer program
product of claim 8.

10. A computer program product for dynamically
selecting a data retriever implementation for
retrieving data from a data storage system (902) in
response to a Boolean expression (500) comprising:
a result counter (102) to determine a number of
hits in response to the Boolean expression;
a threshold evaluator (103) to compare the number
of hits with a threshold value of a first
dimension and to compare the complexity of the
Boolean expression with a further threshold
value of a second dimension;
a first data retriever (111) to retrieve the data
in case the number of hits is below the
threshold value of the first dimension and the
complexity of the Boolean expression is above
the further threshold value of the second
dimension;
a second data retriever (112) to retrieve the data
in case the number of hits is above the
threshold value of the first dimension and the
complexity of the Boolean expression is above
the further threshold value of the second
dimension;
a third data retriever (113) to retrieve the data
in case the number of hits is below the
threshold value of the first dimension and the
complexity of the Boolean expression is below
the further threshold value of the second
dimension; and
a fourth data retriever (114) to retrieve the data
in case the number of hits is above the
threshold value of the first dimension and the
complexity of the Boolean expression is below
the further threshold value of the second
dimension.

11. The computer program product of claim 10, further
comprising:
a retrieval time measuring component (104) to
measure the time that is consumed by a selected
data retriever (111, 112, 113, 114) for various
numbers of hits; and
a threshold calculator (105) to dynamically
determine the threshold value and the further
threshold value on the basis of the results of
the retrieval time measuring component (104)
and to feed back the determined threshold
values into the threshold evaluator (103).

12. The computer program product according to claim 11,
where the first data retriever (111) is implemented
by using a general data retrieval algorithm using
result flag instances.

13. The computer program product according to claim 11
or 12, where the second data retriever (112) is
implemented by using a general data retrieval
algorithm using bit maps.

14. The computer program product according to any one
of the claims 11 to 13, where the third data
retriever (113) is implemented by using a lean AND
data retrieval algorithm using result flag
instances.



15. The computer program product according to any one
of the claims 11 to 14, where the fourth data
retriever (114) is implemented by using a lean AND
data retrieval algorithm using bit maps.

16. A computer system (990) comprising:
a memory to store a computer program product
according to any one of the claims 10 to 15;
and
at least one processor to execute instructions of
the computer program product according to any
one of the claims 10 to 15.

17. A computer system (990) for running a software
application (200) comprising:
variables (210) for storing at least one threshold
value for at least one parameter (P1)
influencing the performance of the software
application (200) with regards to a specific
task; and
a threshold evaluator (220) for comparing (430) the
at least one threshold value to at least one
corresponding current value allowing the
software application (200) to select (440) an
algorithm (A1) from a plurality of algorithms
(A1 to AN) for performing the task in
accordance with the result of comparison.

18. The computer system (990) of claim 17, further
comprising:
a threshold calculator (230) for recalculating
(470) the at least one threshold value in case
the actual performance of the selected
algorithm (A1) is non-compliant with the at
least one threshold value.

19. The computer system (990) of claim 17 or 18, where
the at least one threshold value separates the
value range of the parameter (P1) into at least two
intervals of a first dimension.

20. The computer system (990) of claim 19, wherein the
selected algorithm (A1) is assigned to the interval
that includes the corresponding current value of
the first dimension.

21. The computer system (990) of claim 19, where at
least one further threshold value separates the
value range of a further parameter into at least
two intervals of a second dimension.

22. The computer system (990) of claim 21, wherein the
selected algorithm (A1) is assigned to the
intersection of the interval of the first dimension
that includes the corresponding current
parameter value of the first dimension and the
interval of the second dimension that includes
the corresponding current parameter value of the
second dimension.

23. The computer system (990) of any one of the claims
19 to 22, wherein each threshold value corresponds
to a break-even point where two neighbouring
algorithms have the same performance with respect
to the corresponding dimension.

METHOD AND COMPUTER SYSTEM FOR SOFTWARE TUNING

Abstract of the Invention

Method and computer system for software tuning. A
computer system stores variables (210) for storing at
least one threshold value for at least one parameter
(P1) influencing the performance of a software
application (200) with regards to a specific task. A
threshold evaluator (220) compares (430) the at least
one threshold value to at least one corresponding
current value allowing the software application (200)
to select (440) an algorithm (A1) from a plurality of
algorithms (A1 to AN) for performing the task in
accordance with the result of comparison.

FIG. 1






